The promise of discovering a functional blueprint of a cellular system from large-scale and
high-throughput sequence and experimental data is predicated on the belief that the same top-down
investigative approach that proved successful in other biological problems (e.g., DNA sequencing)
will be as effective when it comes to inferring more complex intracellular processes. The results in
this paper address this fundamental issue in the specific context of transcription regulatory networks.
In particular, we consider a recently introduced experimental technique, the genome-wide location
analysis for DNA-binding regulators, which allows the construction of network topologies relating
transcriptional regulators with all DNA promoter regions they are capable of interacting with.
Although simple recurring regulatory motifs have been identified in the past, due to the size and
complexity of the connectivity structure, the subdivision of such networks into larger, and possibly
inter-connected, regulatory modules is still under investigation. Specifically, it is unclear whether
functionally well-characterized transcriptional sub-networks can be identified by solely analyzing the
connectivity structure of the overall network topology. In this paper, we show that transcriptional
regulatory networks can be systematically partitioned into communities whose members are
consistently functionally related. We applied the partitioning method to the transcriptional regulatory
networks of the yeast Saccharomyces cerevisiae; the resulting communities of genes and
transcriptional regulators can be associated with distinct functional units, such as amino acid metabolism,
cell cycle regulation, protein biosynthesis and localization, DNA replication and maintenance, lipid
catabolism, and stress response. Moreover, the observation of inter-community connectivity
patterns provides a valuable tool for elucidating the inter-dependency between the discovered
regulatory modules.



I. INTRODUCTION 



A. Motivation and Background: Bottom-Up vs. Top-Down

We address one of the primary goals of systems
biology: discovering a functional blueprint of a cellular
system from large-scale and high-throughput sequence
and experimental data. Such a blueprint would describe
how the different components (e.g., genes, proteins, and
signaling molecules) work together to perform various
tasks in the cell. A wealth of information, obtained
through decades of ingenious but painstaking
investigations by biochemists and biologists, has helped
elucidate many aspects and components of the complex
functional organization in different types of cells and
organisms. This investigative approach can be broadly
described as a bottom-up approach, where several smaller
components and systems are first modeled under
carefully constrained conditions; larger systems of
increasing complexity, comprising well-studied smaller
components, are then characterized in subsequent steps.

In sharp contrast to this established approach,
recently introduced high-throughput experimental
techniques hold the promise of enabling a top-down
investigation. The whole-genome shotgun sequencing
method [1, 2], where thousands of short strands of DNA
are sequenced in parallel and then pieced together in a
post-processing computational step to reconstruct a
complete genome sequence, provides a good example of the
few successes that have fueled high expectations. In
order to extend this trend to functional investigations of
cellular systems, new experimental and analysis tools
are being designed. Typically, the simultaneous average
activity or interaction levels of thousands of indicator
molecules and agents (e.g., genes, proteins, and
signaling molecules) are observed or tracked in a population
of cells that have been subjected to different conditions
or otherwise manipulated. Whole-genome
DNA microarray assays, which can estimate the
expression levels of thousands of genes, constitute a prime
example of such a technology. These observed profiles are
then processed, and statistically significant dependencies,
correlations, and other structural relationships inherent
in the data sets are determined. These inherent
dependencies among the observed agents are then expected to
yield working hypotheses about functional relationships
among them; the resulting hypotheses can then be
investigated further using tailored experiments to obtain
a more detailed description of the functional blocks and
mechanisms.

This top-down approach, however, has yet to prove 
its usefulness when it comes to inferring intracellular 
mechanisms and processes. A number of studies have 
attempted to combine sequence information and experi- 
mental data and have devised methods for determining 
potential functional blocks or hidden regulatory
mechanisms [13]. For example, partitioning of
genes into clusters (based on similarity of their activities 
or profiles and related sequence information) could lead 
one to hypothesize that genes in the same cluster are part 
of the same pathway, or that their profiles constitute a 
signature of a particular functionality in the cell. Such 
advances notwithstanding, basic questions concerning the 
power and usefulness of these large-scale approaches are 
yet to be resolved. First, given the complexity of cellu- 
lar processes, the number of agents/outputs tracked in 
any large-scale study comprises only a small fraction of 
all the participating factors or agents. Second, the final 
data sets are the outputs of a highly complex regulatory 
process involving interactions among a large number of 
bio-molecules that operate at different stages and differ- 
ent parts of the cell. Clearly, such considerations lead to 
several basic questions: Is there enough information in 
these data sets to be able to formulate sufficiently many 
significant hypotheses, which would ultimately lead to a 
detailed reconstruction of functional blocks? If not, then 
how much prior knowledge would one need to incorporate 
before one has enough information?



B. Approach and a Preview of Results 

The results in this paper address the above-mentioned
fundamental issues in the specific context of transcription
regulatory networks. In particular, we consider a recently
introduced experimental technique, the genome-wide
location analysis of DNA-binding regulators, which
allows one to construct an interaction network between
regulatory proteins (also referred to as transcription
regulators or factors) and genes. The experiment
relates any given transcriptional regulator to all the
DNA promoter regions it is capable of interacting
with. Typically, the resulting networks involve several
thousand nodes (i.e., hundreds of regulators and
thousands of genes) and an even larger number of
edges, each representing a physical interaction; that is,
a node representing a regulator is connected to a node
representing a gene by an edge if the corresponding
protein binds to the promoter region of the corresponding
gene with a high confidence level [26]. While the design
of these large-scale experiments needs genome sequence
information (e.g., prior knowledge about candidate
regulatory proteins, and the location of genes and their
corresponding promoter regions in the genome), no prior
knowledge about any functionality of the genes or the
regulators is needed. The functionality-blind design of
these experiments makes the inferred transcriptional
regulatory networks good candidates for answering some
of the basic questions raised in the preceding paragraph:

Can functionally well characterized transcriptional 
sub-networks be identified by solely analyzing the con- 
nectivity structure of the overall network topology? In 
particular, we aim at establishing whether partial or 
complete cell pathways can be automatically recognized 
by a method that relies exclusively on the interaction 
network, with no other prior knowledge about the 
specific organism under study. 

In our analysis, we applied the Girvan and Newman
(GN) community partitioning algorithm to the
transcriptional regulatory networks of the yeast
Saccharomyces cerevisiae (S. cerevisiae) (details on the data
are provided in the Results section). This partitioning
method relies purely on the information held
within the connectivity structure of the transcriptional
network, and does not require the introduction of specific
parameters describing the modules sought. The
GN algorithm returns a nested set of "communities" or
"groups" of genes in the form of a tree structure, where
a community is characterized by the fact that nodes
within the community are densely inter-connected, while
connections are significantly sparser between members
of different communities. For example, the transcription
network of S. cerevisiae is divided into 15 different
communities at the first level. These communities are
then subdivided into further communities in a nested
fashion. The depth of the resulting tree decomposition
and the total number of communities are determined
by a parameter called the "modularity index"; its
definition and issues related to its choice are discussed
in more detail in later sections.

In a top-down approach, each such community would 
comprise a set of hypotheses, which would then need to 
be tested using further experiments involving the genes 
belonging to the same community, or exploiting homol- 
ogy with related organisms. This approach, however, can 
succeed only if these structure-based communities have 
coherent functional themes. Since S. cerevisiae is a fairly
well-studied organism, we can obviate the need for
experimentation and verify whether the communities represent
functional blocks based on the information in the existing
literature. Toward this end, we performed an automated
search of the Gene Ontology (GO) database [10], and
obtained a functional annotation of the communities, along
with their significance levels.
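The significance level attached to an annotation can be illustrated with a short sketch. The statistic is not specified at this point in the text; a common choice, assumed here, is the upper tail of a hypergeometric distribution, i.e., the probability of finding at least the observed number of genes carrying a given GO term in a community of a given size drawn at random from the network. The function name `enrichment_p_value` and the toy counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hits).

    population: number of genes in the whole network
    annotated:  genes in the network carrying the GO term
    sample:     number of genes in the community
    hits:       genes in the community carrying the GO term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: a 14-gene community in which 6 genes carry a term
# that annotates 60 of the 2,845 target genes in the network.
p = enrichment_p_value(population=2845, annotated=60, sample=14, hits=6)
print(f"enrichment p-value: {p:.3g}")
```

A small p-value indicates that the term is over-represented in the community relative to chance; in practice a correction for testing many GO terms would also be needed.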

When the communities are tagged with their
corresponding statistically significant functional annotations,
the result is a remarkable functional blueprint of the cell
(see the accompanying figure). At the bottom level of the nested
organizational architecture, genes and regulators are grouped
into homogeneous modules performing basic
functionalities; these basic functional modules are in turn organized
into a hierarchy that captures progressively higher levels
of functionality, culminating in high-level cellular
operational blocks, such as amino acid metabolism, cell
cycle regulation, protein biosynthesis and localization, DNA
replication and maintenance, lipid catabolism, and stress
response. In addition, the patterns of
inter-community edges provide important insights into how
large regulatory modules are interconnected with each
other and how they might coordinate their activities.

C. Potential Implications 

There are several potential implications of the study 
reported here: 

• The Promise of the Top-Down Approach: Our results
show that the still-unproven top-down approach
has considerable merit, at least in the case of
transcriptional regulatory networks. The structural
features of the analyzed networks seem to have
significant functional implications, as verified
using the GO database. We must note, however,
that the evaluation of the functional significance
of the partitioning procedure is limited by the fact
that current knowledge about cellular systems is
incomplete. For example, the network contains
several hundred genes whose functionalities are
unknown. Moreover, even for genes whose functional
descriptions are found in the GO database, the database
might be missing key regulatory interactions that are
nonetheless captured in the experimentally inferred
network. Thus, the communities of genes and
regulators and their connectivity structure might hold
much more functional information than what is
revealed by the GO database. For example, we are
currently investigating the differences and
similarities in the topological characteristics of the different
functional blocks.

• Tracking Context-Sensitive Reorganization of
Functional Blocks: One can now apply our
community partitioning analysis to the regulatory
networks derived for the same organism
when cells are subjected to different conditions.
Regulatory networks obtained from such
experiments have already been reported by the same
group at MIT for S. cerevisiae, and they have
also reported some of the changes in the
network connectivity that result from varying the
environmental conditions; e.g., the genes regulated
by certain regulators change considerably from
one condition to another, thus making fairly
significant changes in the regulatory networks.
We are currently studying the changes in the
community structure as the conditions are varied.
This could lead to a better understanding of how
the functional blocks get reorganized and how
different blocks merge or split as the organism
reacts to different conditions.

• Comparisons Across Organisms: Our results
suggest a systematic means both for exploring the
functional organization of an unstudied organism
and for comparing the community structures across
organisms. For example, one could study how the
communities and their relationships change from
species to species. This could elucidate different
organizations of functional blocks, their
diversity, and any evolutionary footprint that might be
gleaned from analyzing the regulatory modules.
Similarly, for an organism that has not been
studied, a genome-wide location analysis could be used
to obtain its regulatory network. The communities
in the regulatory network can then be investigated
for functional significance, using known instances
of structure-function relationships observed in
other organisms. This could lead to an automated
and faster means for deciphering salient
characteristics and distinguishing features of the
unstudied organism.

• A Pedagogical Shift? Biochemistry textbooks have
mostly followed the lead of the previously discussed
bottom-up approach to exploring cellular systems.
Perhaps an equally useful alternative would be to
use large-scale networks, as obtained from
high-throughput experiments, as guides to naturally
unfold a detailed description of the organization and
architecture of cellular systems. Figs. 1 and 2,
embellished with more detailed annotations, seem to
be good candidates for what might be an
introductory chapter of a systems biology textbook, and a
guide to how the different chapters (e.g., each
corresponding to a community) might be organized
and interlinked. Recall that these annotated
figures were generated from a large-scale
transcriptional regulatory network in an automated fashion.
Such a network-based exposition provides a
multi-dimensional view of the system, capturing the
different functional blocks at different scales and in
different functional relationships with others.

D. Notes on Previous Work 

Gene networks in general, and transcriptional
regulatory networks in particular, have been studied for some
time. For example, several large-scale properties of
transcription regulatory networks (including the particular
network used in our study) have been thoroughly
investigated. It has been shown that the out-degree
(the number of genes regulated by each factor) typically
follows a power law, while the in-degree distribution (the
number of regulators affecting the promoter region of
each gene) has an exponential decay, thus demonstrating
that these networks share several properties with
scale-free topologies. In a separate effort, characteristic
regulatory motifs have been identified [15] by
comparing their frequency of occurrence with that of randomly
generated topologies having the same large-scale
properties as transcriptional networks. Typically, these
motifs involve a very small number of nodes (2-10) and
are organized in a finite set of simple structures, such
as single-input or multi-input regulatory modules,
feed-forward loops and so on. These structures are very closely
related to well-known transcriptional regulation units,
such as operons and regulons. However, due to the size
and complexity of the connectivity structure, the
subdivision of such networks into large, and frequently
inter-connected, regulatory modules was not addressed.

FIG. 1: Tree structure organization of the functional modules obtained by partitioning the transcriptional regulatory network
of S. cerevisiae (only the top two levels of partitioning are shown). Details on the characteristics of each regulatory sub-network
are provided in the Results section.
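The out-degree and in-degree distributions mentioned in this subsection can be tabulated directly from a list of regulator-to-gene edges. The following is a minimal sketch under that assumption; the function name `degree_histograms` and the toy edge list are hypothetical and do not come from the data set used in this paper.

```python
from collections import Counter


def degree_histograms(edges):
    """Return (out_hist, in_hist) for a list of (regulator, gene) edges.

    out_hist[k] = number of regulators that bind exactly k genes
    in_hist[k]  = number of genes bound by exactly k regulators
    """
    edges = list(edges)
    out_degree = Counter(regulator for regulator, _ in edges)
    in_degree = Counter(gene for _, gene in edges)
    return dict(Counter(out_degree.values())), dict(Counter(in_degree.values()))


# Toy edge list (made up for illustration).
toy_edges = [("TF_A", "g1"), ("TF_A", "g2"), ("TF_A", "g3"), ("TF_B", "g1")]
out_hist, in_hist = degree_histograms(toy_edges)
print("out-degree histogram:", out_hist)
print("in-degree histogram:", in_hist)
```

Plotting the out-degree histogram on log-log axes and the in-degree histogram on semi-log axes is one simple way to inspect power-law and exponential behavior, respectively.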

As for inferring functional blocks, several
methodologies have been devised that combine different types of
high-throughput data sources in order to build
functionally coherent modules of genes and transcriptional
regulators [13]. For example, one algorithm
combines information from genome-wide location
analysis and expression data sets in order to identify
regulatory networks of gene modules, where the latter are
defined as sets of genes that are both co-expressed
and share a common set of transcription factors
known to bind to their promoter regions. Another
approach assigns genes to context-dependent and
potentially overlapping "transcription modules"; the
method clusters co-regulated groups
of genes based on their expression levels measured under
specific experimental conditions. A further method,
starting from a gene expression
data set and a pre-compiled set of candidate
transcriptional regulators, simultaneously identifies a partition of
genes into modules, as well as a regulatory program, i.e.,
a set of rules that explains the expression behavior of
the members of each module. Finally, an
approach based on integrating the analysis of common
sequence motifs in genes' promoter regions with expression
level data has also been described.

Lastly, we note that topology-based
community-finding techniques have been applied to biological data in at
least a couple of examples. The first is the analysis of
protein-protein interaction databases [20]. The second
is the organization of literature data, where networks of
gene co-occurrences are extracted by parsing the abstracts
of scientific articles covering a specific topic [21].
However, the fact that topology partitioning algorithms
are capable of identifying well-defined functional units,
as shown in our paper, is, to our knowledge, the first
compelling evidence of a significant association between
structure and function in cellular networks.



E. Background on Community Finding Algorithms 

Network topologies that are rich in structure have been 
studied in several non-biological fields. One example is 
given by Internet browsing patterns, which can be effec- 
tively represented as directed graphs linking users with 
the sites they tend to visit most frequently. Another 
example is represented by peer-to-peer networks, where
computer users are connected either by a file-sharing
architecture or through a social-network type of
infrastructure. In both cases, methods have been devised that are
capable of automatically partitioning the nodes in the 
network into groups or "communities". Typically, what 
characterizes a community is the fact that nodes within 
the community itself are densely inter-connected, while 
such connections tend to be sparser between members of 
different communities. 

Several examples of network partitioning algorithms
are described in the literature. Among them are
spectral bisection, the Kernighan-Lin algorithm, and the
Girvan and Newman algorithm [23], just to cite a
few. The latter is of particular interest, having been
applied successfully to several different types of networks,
such as scientific collaboration networks, social networks, and
the World Wide Web. As
we note in the Discussion section, the GN algorithm
suffers from several drawbacks, and we are pursuing more
flexible community partitioning approaches, which will
be better equipped to capture the complexity of cellular
systems.



II. METHODS 

The transcriptional network topology was derived from
a whole-genome binding site location analysis of the yeast
Saccharomyces cerevisiae. The experimental
procedure identifies the binding affinities between a set of 203
transcriptional regulators and the yeast DNA under
different experimental conditions. With a high confidence
level (P < 0.001), the promoter regions of a total of 2,845
genes were identified as targets of regulation, for a total
of 6,170 regulatory interactions, in rich media conditions.
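As a minimal sketch of how such a topology could be assembled, the snippet below keeps one edge per regulator-promoter pair whose binding p-value falls below the confidence threshold. The dictionary layout, the function names `build_network` and `summarize`, and the toy p-values are assumptions made for illustration; they are not the format of the original data files.

```python
def build_network(binding_pvalues, threshold=0.001):
    """Keep (regulator, gene) pairs whose binding p-value is below the threshold.

    binding_pvalues: dict mapping (regulator, gene) -> p-value (assumed layout).
    """
    return sorted(pair for pair, p in binding_pvalues.items() if p < threshold)


def summarize(edges):
    """Count the regulators, target genes and interactions in an edge list."""
    return {
        "regulators": len({regulator for regulator, _ in edges}),
        "genes": len({gene for _, gene in edges}),
        "interactions": len(edges),
    }


# Made-up p-values for two hypothetical regulators and three hypothetical genes.
toy_pvalues = {
    ("TF_A", "gene_1"): 0.0002,
    ("TF_A", "gene_2"): 0.0005,
    ("TF_B", "gene_3"): 0.0001,
    ("TF_B", "gene_1"): 0.2,  # above threshold, so no edge is created
}
toy_edges = build_network(toy_pvalues)
print(summarize(toy_edges))
```

Note that a regulator with no binding event below the threshold does not appear in the resulting edge list, so the regulator count reported by `summarize` can be smaller than the number of regulators assayed.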

We analyzed the resulting network topology by using
the faster implementation of the Girvan and Newman
algorithm [25], which is particularly suitable for handling large
networks involving several thousand nodes. The
algorithm is based on the idea of successively removing
edges with the highest degree of "betweenness" until a
final partitioning of the nodes is obtained. The degree of
"betweenness" measures the likelihood that a particular
edge lies between two separate communities: in [9] this
is calculated by finding the shortest paths between all
pairs of nodes in the network and counting the frequency
with which each edge is traversed. The edges that are
traversed more often are likely to be interconnecting
separate communities in the network. To determine the
optimal number of edges to be removed, the algorithm relies
on the notion of modularity Q, which provides a
measure of the fraction of within-community edges minus the
expected value of the same quantity in a network with
the same community partitioning but random
connections between its nodes.
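The edge-removal procedure and the modularity score described above can be sketched in a few dozen lines. This is an illustrative, unoptimized version and not the faster implementation used in our analysis: it treats the network as undirected and unweighted, assumes at least one edge and no self-loops, recomputes all betweenness scores after every removal, and uses the standard form Q = sum over communities of (fraction of edges inside the community) minus (fraction of edge ends attached to the community) squared. All function names and the toy graph are hypothetical.

```python
from collections import deque


def components(adj):
    """Connected components of an undirected graph {node: set(neighbors)}."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u] - seen:
                seen.add(v)
                stack.append(v)
        comps.append(comp)
    return comps


def edge_betweenness(adj):
    """Shortest-path edge betweenness (each node pair is counted twice)."""
    score = {}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:  # breadth-first search from s, counting shortest paths
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0.0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):  # accumulate path shares back toward s
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((u, w))
                score[edge] = score.get(edge, 0.0) + share
                delta[u] += share
    return score


def modularity(adj, communities):
    """Modularity Q of a partition, evaluated on the original graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    q = 0.0
    for comm in communities:
        internal = sum(1 for u in comm for v in adj[u] if v in comm) / 2
        degree = sum(len(adj[u]) for u in comm)
        q += internal / m - (degree / (2 * m)) ** 2
    return q


def girvan_newman(adj):
    """Remove highest-betweenness edges one by one; keep the best-Q partition."""
    work = {u: set(nbrs) for u, nbrs in adj.items()}
    best_parts = components(work)
    best_q = modularity(adj, best_parts)
    while any(work.values()):
        scores = edge_betweenness(work)
        u, v = max(scores, key=scores.get)
        work[u].discard(v)
        work[v].discard(u)
        parts = components(work)
        q = modularity(adj, parts)
        if q > best_q:
            best_q, best_parts = q, parts
    return best_q, best_parts


# Toy graph (hypothetical): two triangles joined by the single bridge 2-3.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
best_q, best_parts = girvan_newman(toy)
print(round(best_q, 4), sorted(sorted(c) for c in best_parts))
```

On the toy graph the bridge carries the largest betweenness, so it is removed first and the two triangles emerge as the partition with the highest modularity.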

Values of Q above 0.3 have been suggested as mean- 
ingful for identifying significant partitions. In order to 
choose a statistically significant level of the modular- 
ity index, we ran the community finding algorithm on 
1,000 randomly generated topologies, characterized by 
the same connectivity pattern as the transcriptional net- 
work of S. cerevisiae, but with the edges assigned at ran- 
dom. We found that a modularity threshold of Q = 0.5 
is sufficient to guarantee that the partitioned structures 
are significant and not simply due to general large-scale 
properties of the network (for Q = 0.5 no partitioning is 
found in any of the randomly generated topologies). 
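One way to generate randomized topologies of this kind is sketched here. The sketch assumes that "the same connectivity pattern" means that every regulator keeps its out-degree and every gene keeps its in-degree, which is achieved by repeatedly swapping the targets of two randomly chosen edges; this reading, the function name `rewire`, and the toy edge list are assumptions rather than a description of the exact procedure used.

```python
import random
from collections import Counter


def rewire(edges, n_swaps, seed=0):
    """Degree-preserving randomization of a list of (regulator, gene) edges.

    Each accepted swap turns (r1, g1), (r2, g2) into (r1, g2), (r2, g1),
    so every regulator keeps its out-degree and every gene its in-degree.
    Requires at least two edges.
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (r1, g1), (r2, g2) = edges[i], edges[j]
        new1, new2 = (r1, g2), (r2, g1)
        # Skip swaps that change nothing or would create a duplicate edge.
        if r1 == r2 or g1 == g2 or new1 in present or new2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


# Toy edge list (made up for illustration).
toy_edges = [("TF_A", "g1"), ("TF_A", "g2"), ("TF_B", "g2"),
             ("TF_B", "g3"), ("TF_C", "g4")]
rewired = rewire(toy_edges, n_swaps=100, seed=1)
same_out = Counter(r for r, _ in rewired) == Counter(r for r, _ in toy_edges)
same_in = Counter(g for _, g in rewired) == Counter(g for _, g in toy_edges)
print(same_out, same_in)
```

Each randomized network would then be partitioned in the same way as the real one, and the largest modularity found across the ensemble gives a reference level against which the threshold on Q can be judged.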

Once a top-level partitioning is achieved, each of the
resulting communities is evaluated for further subdivision.
Because of the scale-free nature of this type of network,
communities tend to vary considerably in size (ranging
between 3 and 258 nodes). Up to three nested levels of
partitioning were considered.
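The recursive stage can be sketched as a generic routine that takes any partitioning function and applies it to each community in turn, down to a fixed depth. In the sketch below, `partition` stands in for the community-finding step together with its modularity threshold, and the toy `halve` function is only a placeholder so that the example runs; both names are hypothetical. Communities are labeled by their position in the tree, with "0" as the root.

```python
def nested_communities(nodes, partition, max_depth=3, label="0"):
    """Return {label: node set}, recursing at most max_depth levels below the root.

    partition(node_set) must return a list of node sets; a list with fewer
    than two parts means the community is not subdivided further.
    """
    tree = {label: set(nodes)}
    if max_depth == 0:
        return tree
    parts = partition(set(nodes))
    if len(parts) < 2:
        return tree
    for index, part in enumerate(parts):
        child = f"{label}.{index}"
        tree.update(nested_communities(part, partition, max_depth - 1, child))
    return tree


def halve(nodes):
    """Placeholder partitioner: split sorted nodes in half if there are at least 4."""
    ordered = sorted(nodes)
    if len(ordered) < 4:
        return [set(ordered)]
    mid = len(ordered) // 2
    return [set(ordered[:mid]), set(ordered[mid:])]


tree = nested_communities(range(8), halve, max_depth=3)
for name in sorted(tree):
    print(name, sorted(tree[name]))
```

With a real partitioner, the decision to stop subdividing would come from the modularity criterion rather than from community size.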

Since the connectivity topology is determined through
an experimental procedure, it is critical to determine
how stable the outcome of the partitioning algorithm is.
Therefore, besides assessing the biological significance of
the resulting community structure, we describe a
procedure for evaluating the statistical robustness of the
network partitioning results, based on systematically
introducing random errors in the connectivity topology.
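A robustness check of this kind can be sketched as follows. The sketch makes two assumptions that are not stated in the text: false negatives and false positives are simulated together, by deleting a given fraction of the edges and adding the same number of new regulator-gene edges, and two partitions are compared with the Rand index (the fraction of node pairs on which they agree). The function names and the toy data are hypothetical.

```python
import random
from itertools import combinations


def perturb_edges(edges, regulators, genes, fraction, seed=0):
    """Delete a fraction of the edges and add as many new (regulator, gene) edges."""
    rng = random.Random(seed)
    edges = set(edges)
    k = round(fraction * len(edges))
    removed = set(rng.sample(sorted(edges), k))
    absent = sorted((r, g) for r in regulators for g in genes if (r, g) not in edges)
    added = set(rng.sample(absent, min(k, len(absent))))
    return (edges - removed) | added


def rand_index(labels_a, labels_b):
    """Fraction of node pairs grouped consistently by two labelings.

    Both arguments map node -> community label over the same (two or more) nodes.
    """
    pairs = list(combinations(sorted(labels_a), 2))
    agree = sum(
        (labels_a[u] == labels_a[v]) == (labels_b[u] == labels_b[v])
        for u, v in pairs
    )
    return agree / len(pairs)


# Toy network (made up for illustration).
regulators = ["TF_A", "TF_B"]
genes = ["g1", "g2", "g3", "g4", "g5"]
edges = [("TF_A", "g1"), ("TF_A", "g2"), ("TF_A", "g3"),
         ("TF_B", "g3"), ("TF_B", "g4"), ("TF_B", "g5")]
noisy = perturb_edges(edges, regulators, genes, fraction=0.5, seed=0)
print(len(noisy), len(noisy & set(edges)))
```

The perturbed network would be re-partitioned and the resulting labeling compared with the original one; a Rand index close to 1 indicates that the community structure is largely preserved.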



III. RESULTS 

Our procedure identified 15 top-level communities
(Q = 0.6285) that were further subdivided into a total
of 79 modules during the recursive stage (Fig. 1 shows
the tree structure of the resulting communities). The
sensitivity of the resulting community structure to
errors in the connectivity topology was evaluated by
randomly adding or deleting a fraction (varying between 1%
and 20%) of the edges and re-partitioning the graph
according to the modified topology. Our results show that
for a range of false-positive and false-negative
connections similar to those reported for this experimental data
set, the outcome of the partitioning algorithm is very
consistent. Although a small percentage of nodes are
redistributed across different modules, the community
structure is mostly preserved.

| ID | #TFs | #genes | GO Annotation | p-value |
|---|---|---|---|---|
| 0.0 | 27 | 415 | Amino acid biosynthesis, transport; cell growth | |
| 0.0.0 | 8 | 96 | Amino acid biosynthesis/metabolism | 4.49e-22 |
| 0.0.0.0 | 1 | 39 | Amino acid biosynthesis (ornithine) | 4.67e-15 |
| 0.0.0.1 | 2 | 19 | Amino acid transport | 2.11e-08 |
| 0.0.0.2 | 2 | 14 | Arginine biosynthesis | 2.46e-06 |
| 0.0.0.3 | 1 | 14 | Branched chain family amino acid biosynthesis (leucine) | 3.59e-11 |
| 0.0.0.4 | 2 | 10 | Urea cycle intermediate biosynthesis | 6.75e-05 |
| 0.0.1 | 6 | 67 | Carbohydrate transport, sterol transport | 7.25e-10 |
| 0.0.2 | 1 | 55 | Carbohydrate metabolism | 3.05e-3 |
| 0.0.3 | 3 | 51 | Iron ion transport, hexose transport | 4.43e-3 |
| 0.0.4 | 2 | 46 | Response to metal ion | 8.17e-5 |
| 0.0.5 | 2 | 38 | Nucleoside and nucleotide metabolism | 6.44e-3 |
| 0.0.6 | 3 | 35 | Cell wall organization and biogenesis | 4.0e-4 |
| 0.0.7 | 2 | 27 | Gluconeogenesis, carbohydrate biosynthesis | 1.55e-3 |
| 0.1 | 17 | 405 | Cell cycle regulation | |
| 0.1.0 | 6 | 138 | Cell cycle, DNA replication and chromosome cycle | 2.99e-7 |
| 0.1.1 | 3 | 96 | Cell cycle, M phase | 1.0e-4 |
| 0.1.2 | 3 | 81 | Cell cycle, intracellular transport | 1.4e-4 |
| 0.1.3 | 3 | 64 | Conjugation with cellular fusion, sexual reproduction | 1.05e-9 |
| 0.1.4 | 2 | 26 | Response to stimulus, regulation of cell cycle | 6.32e-3 |
| 0.2 | 31 | 384 | DNA recombination, maintenance | |
| 0.2.0 | 6 | 102 | Cytokinesis, completion of separation | 8.79e-5 |
| 0.2.1 | 4 | 74 | Telomerase dependent telomere maintenance | 2.64e-7 |
| 0.2.2 | 8 | 60 | Establishment and/or maintenance of chromatin architecture | 1.10e-07 |
| 0.2.2.0 | 2 | 17 | Chromatin assembly/disassembly, DNA packaging | 3.12e-11 |
| 0.2.2.1 | 1 | 17 | Cytoplasm organization and biogenesis | 2.23e-2 |
| 0.2.2.2 | 2 | 9 | RNA processing | 6.18e-2 |
| 0.2.2.3 | 2 | 8 | Unknown function | n/a |
| 0.2.2.4 | 1 | 9 | Response to pheromone during conjugation | 1.45e-3 |
| 0.2.3 | 8 | 42 | Mitotic recombination | 2.49e-3 |
| 0.2.? | 2 | 43 | Cell organization and biogenesis | 9.97e-7 |
| 0.2.? | 2 | 39 | ATP synthesis coupled proton transport | 1.17e-11 |
| 0.2.? | 1 | 24 | Protein folding, response to stress | 7.3e-4 |
| 0.3 | 5 | 253 | Protein biosynthesis, RNAs metabolism | 8.08e-9 |
| 0.4 | 25 | 231 | Protein synthesis, transport and glycosylation | |
| 0.4.0 | 3 | 38 | Cell cycle checkpoint | 2.32e-2 |
| 0.4.1 | 1 | 35 | Regulation of transcription | 5.54e-2 |
| 0.4.2 | 4 | 29 | Protein polyubiquitination | 2.46e-2 |
| 0.4.2.0 | 1 | 10 | RNA metabolism | 2.09e-2 |
| 0.4.2.1 | 1 | 8 | Cofactor metabolism | 8.08e-3 |
| 0.4.2.2 | 1 | 7 | Macromolecule catabolism | 1.43e-2 |
| 0.4.2.3 | 1 | 4 | Nucleic acid metabolism | 1.32e-3 |
| 0.4.3 | 4 | 27 | Regulation of protein biosynthesis | 3.91e-3 |
| 0.4.4 | 3 | 27 | Glucose metabolism | 4.1e-4 |
| 0.4.5 | 2 | 25 | Protein transport and localization (ER to Golgi) | 1.99e-2 |
| 0.4.6 | 5 | 18 | Intracellular protein transport | 5e-4 |
| 0.4.7 | 2 | 18 | Protein amino acid glycosylation | 5.5e-4 |
| 0.4.8 | 1 | 14 | Golgi vesicle transport, protein localization | 9.35e-3 |

TABLE I: List of functional modules obtained by partitioning the topology of the transcriptional network of S. cerevisiae. The
annotation for each community was obtained from the Gene Ontology database. The significance of the enrichment is expressed
by the p-value associated with the list of nodes in each module (with the most significant highlighted in red).

Tables I and II provide a description of each
community, where the annotation and the significance of the
enrichment (measured as a p-value) were obtained from
the Gene Ontology (GO) database [10]. The numbers of
genes and transcription factors present in each module
are also included. Each community is labeled according
to its nesting level: for example, modules labeled {0.x.y.z}
are sub-communities of {0.x.y}, which are in turn
sub-communities of {0.x}. Community {0} is the root module,
including all genes and regulators in the network.
Fig. 2 provides a global view of the organization structure
of the regulatory modules.

| ID | #TFs | #genes | GO Annotation | p-value |
|---|---|---|---|---|
| 0.5 | (illegible) | 231 | | |
| 0.5.0 | 1 | (illegible) | Ubiquitin-dependent protein catabolism, peptidolysis | (illegible) |
| 0.5.1 | (illegible) | (illegible) | Sporulation, spore wall assembly | (illegible) |
| 0.5.2 | (illegible) | (illegible) | mRNA catabolism, deadenylation-dependent decay | (illegible) |
| 0.5.3 | 1 | 19 | Secretory pathway | 2.59e-3 |
| 0.5.4 | (illegible) | 14 | G1-specific transcription in mitotic cell cycle | (illegible) |
| 0.5.5 | 3 | 11 | SRP-dependent protein-membrane targeting, translocation | 1.0e-4 |
| 0.6 | 4 | 189 | | |
| 0.7 | (illegible) | | Lipid biosynthesis, degradation | |
| 0.7.0 | 1 | 86 | Ergosterol metabolism, NADH metabolism | 4.12e-13 |
| 0.7.1 | (illegible) | (illegible) | Lipid biosynthesis, phospholipid metabolism | |
| 0.7.2 | 2 | 21 | (illegible) | 1.22e-5 |
| 0.7.3 | | 11 | (illegible) | |
| 0.7.4 | 1 | 6 | (illegible) | 2.54e-6 |
| 0.7.5 | (illegible) | (illegible) | Unknown | n/a |
| 0.8 | (illegible) | (illegible) | Aerobic respiration | |
| 0.8.0 | (illegible) | (illegible) | Ty element transposition, alcohol metabolism | 4e-4 |
| 0.8.1 | (illegible) | (illegible) | Organelle organization and biogenesis | (illegible) |
| 0.8.2 | (illegible) | (illegible) | Sulfur metabolism, methionine metabolism | |
| 0.8.3 | (illegible) | | Purine nucleotide metabolism | (illegible) |
| 0.8.4 | | 12 | Carboxylic acid metabolism | 6.83e-2 |
| 0.8.5 | 2 | 9 | | |

TABLE II: List of functional modules (cont.) obtained by partitioning the topology of the transcriptional network of S. cerevisiae.
The annotation for each community was obtained from the Gene Ontology database. The significance of the enrichment is
expressed by the p-value associated with the list of nodes in each module (with the most significant highlighted in red).

The following paragraphs provide a description of 
the top-level communities identified with the procedure. 
A. Sample communities

The amino-acid biosynthesis module (see the corresponding
figure) is the largest of the top-level communities, involving
a total of 417 genes and 25 TRs. The module includes a
well-defined number of separate sub-structures associated with
the synthesis of specific amino acids (arginine, ornithine,
glutamine and leucine), with amino-acid transport, and with
the biosynthesis of intermediate products in the urea
cycle. The community also includes several sub-modules
related to carbohydrate transport and metabolism, a result
consistent with the known relationship between carbon
source pathways and amino acid synthesis.

Several cell division cycle genes (cdc5, cdc6, cdc7,
cdc20, cdc39, cdc48) are among the key members of
community 0.1 (Fig. ^), which is essentially associated with cell
cycle related activities, including DNA replication and
chromosome cycle as well as the regulation of the different
stages of mitosis. Both G1 cyclins (cln1, cln2, and cln3)
and B-type cyclins (clb2, clb4, clb5, and clb6), involved
in activation of S, G2, and M phases of the cell cycle, are
included among the genes in this module, together with
several key players of G1/S transition (swe1, sap185) and
G2/M transition (Swi4, ndd1). The partition also
includes a separate sub-community implicated in conjugation
with cellular fusion and sexual reproduction, which
includes the genes mdg1, afr1 and mfa1 (all involved
in signal transduction during conjugation), jem1, scw10,
fus1 and fusS (both regulated by mating pheromone and
proposed to coordinate events required for fusion), gpa1
and the transcription factor Ste12 (activated by a MAPK
signaling cascade), an inducer of genes involved in mating.

A particularly well-characterized community is the
one implicated in DNA recombination and maintenance
(Fig. ^), encompassing specific sub-modules related to
the establishment and maintenance of chromatin
architecture (hhf1, hhf2, hht1, hht2, hta1, htb1), telomere
maintenance (yrf1-7), and mating type specific
regulators (Ash1, Hmla1, Hmla2, Mata1, Mata2). This
large community (381 genes) also includes two large
sub-networks: the first is involved in nucleoside phosphate
metabolism (atp1, atp2, atp5, atp14, atplS, atplQ, and
atp20), a precursor pathway of DNA molecule biosynthesis,
while the second is linked to cytokinesis and
completion of separation (scw11, cts1, chs1).

Community 0.3 (Fig. ^) comprises almost exclusively
genes involved in different stages of protein biosynthesis
and RNA processing, including pre-mRNA splicing
(smx2, syf2, cwc2, prp4, syf1), polyadenylation (pta1,
ref2, fip1, rna14), capping (cet1), rRNA processing
(utp8, utp14, rrp7, rrp12, rrp^B, nhp2, pop4), and
transport of the different RNAs (nup57, nup84, nup170,
nuplSS, gbp2, pom152). This module also includes
several proteins which are implicated in vesicle-mediated
transport, such as emp47, erv46, sec18, rer1, cop1,
ykt6, erv25 (endoplasmic reticulum to Golgi transport),
and vps33, pik1, sso1 (Golgi to endosome or plasma
membrane transport).



Several important steps of protein metabolism in the
cell can be assigned to different sub-structures of
module 0.4 (Fig. ^). In particular, the community includes
genes associated with protein transport and localization,
such as stp22, vps27, vps41, vps61, vps66, vps74
(protein vacuolar targeting), pam16 (mitochondrial matrix
protein import), nup159 (mRNA-nucleus export), atg7
and atg9 (protein autophagy), srp14 (protein-ER targeting),
or tim18 (protein-membrane targeting). Separate
sub-communities are involved in protein amino-acid
glycosylation (ktrS, hoc1, alg1, mnn10, pmt5) and protein
ubiquitination (ubc11, ufd4).

Closely related to the protein biosynthesis community
is the module related to protein catabolism and the
secretory pathway (Fig. ^). The largest of its different
sub-structures (115 genes) is involved in
ubiquitin-dependent protein catabolism, proteolysis and
peptidolysis (rpt5, shp1, rpn2, preS, grr1, asiS). Other relevant
sub-modules are those related to the secretory pathway
(sec4, sec8, sec20, sec23, sec24, sec27, snc1, ypt7) and
sporulation (spr5, pfs1, ssp1, dit1, dit2, sps4, dtr1, gts1).

A module entirely dedicated to ribosome biogenesis
and assembly is shown in Fig. ^. This community
involves the vast majority of ribosomal proteins as well as
all the genes involved in the assembly of the large and
small ribosomal subunits. Community 0.7 is functionally
associated with processes related to lipid metabolism,
with several sub-modules spanning from lipid biosynthesis
to lipid degradation in aerobic conditions.

Communities 0.8 and 0.9 are both associated with pathways
which are active in aerobic conditions, with a significant
sub-component involved in galactose metabolism.
Finally, communities 0.10 and 0.11 are both highly
specialized. The first includes several genes implicated in
the response to chemical substances, drug transport, and
response to DNA damage stimuli. The second is linked
to oxidative stress response.



B. Inter-community connectivity patterns 

Once a partitioning into communities is obtained, one
can study how different types of regulatory modules are
connected to each other. The relative density of
edges running across the various communities provides
an indication of the degree of co-regulation among
members of different communities. Fig. 3 shows a map of
this density of inter-connections, with the large shadowed
boxes enclosing the patterns of connectivity within
each top-level community. A close examination of the
highest edge densities reveals a number of significant
patterns of connectivity among the discovered modules.
Several of them associate the largest regulatory sub-network
(amino acid biosynthesis, Fig. ^) with modules implicated
in lipid metabolism and ATP synthesis coupled
electron transport, in agreement with the known
dependencies among these pathways. A detailed description of
several highlighted co-regulation patterns is provided in
the caption of Fig. 3.
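
The density of edges between communities discussed above can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the exact normalization, so the sketch divides the number of edges joining two communities by the number of possible node pairs between them, and treats edges as undirected. The function name `inter_community_density` and the toy node names are hypothetical.

```python
from collections import Counter


def inter_community_density(edges, membership):
    """Edge density for every pair of communities joined by at least one edge.

    edges: iterable of (regulator, target) pairs, treated as undirected.
    membership: dict mapping each node to its community label.
    Normalization (an assumption): edges observed / node pairs possible.
    """
    sizes = Counter(membership.values())
    counts = Counter()
    for u, v in edges:
        if u == v:  # ignore self-regulation loops
            continue
        pair = tuple(sorted((membership[u], membership[v])))
        counts[pair] += 1
    density = {}
    for (a, b), n_edges in counts.items():
        if a == b:
            possible = sizes[a] * (sizes[a] - 1) // 2
        else:
            possible = sizes[a] * sizes[b]
        density[(a, b)] = n_edges / possible
    return density


# Toy network (hypothetical nodes): two communities with one cross edge.
membership = {"TR_a": "0.0", "g1": "0.0", "g2": "0.0", "TR_b": "0.7", "g3": "0.7"}
edges = [("TR_a", "g1"), ("TR_a", "g2"), ("TR_a", "g3"), ("TR_b", "g3")]
density = inter_community_density(edges, membership)
print(density)
```

Plotting the resulting values on a grid indexed by community label would give a map of the kind described in the text, with diagonal entries corresponding to connectivity inside each community.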

IV. DISCUSSION 

Although the vast majority of the transcriptional
sub-networks identified by the partitioning algorithm showed
a significant functional coherence, we also found several
examples of (often smaller) modules whose functional
pertinence could not be easily determined (e.g.
communities 0.7.3, 0.7.5, 0.8.6, 0.11.5, and 0.11.6). In other
cases, even when a general functional category could be
assigned to a sub-module, the enrichment level of the
resulting annotation was not deemed statistically significant
(cfr. Tables I and II). At least two factors limit our
ability to evaluate the results of the partitioning procedure:
the first is the limited amount of information currently
available in ontology databases. The experimental
procedure used to obtain the connectivity data involves
several hundred genes whose molecular function or related
biological process is unknown. Moreover, even in
those cases when a functional description is available for
the associated nodes, the experimental procedure is likely
to capture regulatory interactions that have not been
previously observed. The presence of a non-negligible
number of false positives and false negatives in the data
is also likely to affect the outcome of the module
partitioning procedure.
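
To make the notion of enrichment significance concrete, the sketch below computes a p-value with a one-sided hypergeometric test, which is a common choice for Gene Ontology enrichment. This is an assumption: the text does not state which test was used, and the function name and the numbers in the example are purely illustrative.

```python
from math import comb


def enrichment_p_value(n_background, n_annotated, n_module, n_hits):
    """P(X >= n_hits) under a hypergeometric null model.

    n_background: genes in the whole network
    n_annotated:  background genes carrying the annotation
    n_module:     genes in the module under test
    n_hits:       module genes carrying the annotation
    """
    upper = min(n_annotated, n_module)
    total = comb(n_background, n_module)
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_module - i)
        for i in range(n_hits, upper + 1)
    )
    return tail / total


# Illustrative numbers only: a 5-gene module in which all 5 genes carry
# an annotation shared by 5 of 10 background genes.
p = enrichment_p_value(10, 5, 5, 5)
print(p)
```

Small modules leave little room for a low p-value under such a test, which is consistent with the observation that several of the smaller modules lack a statistically significant annotation.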

The method we employed for partitioning
S. cerevisiae's transcriptional network topology can
be improved in several ways. The most limiting aspect
of the current procedure is that nodes cannot be assigned
simultaneously to multiple modules. When testing the
robustness of the community finding procedure against
errors in the connectivity topology, we discovered that
the nodes that are more likely to be re-assigned to a
different module are those that are weakly connected to
the original module. Typically, these nodes act as
links among different sub-networks and should not be
uniquely assigned to a single partition. A limitation both
of the data currently available and of the partitioning
algorithm is the inability to account for the stochastic
nature of the connectivity topology. A more robust
framework would be one where regulatory interactions
are assigned a probability density function and nodes
are assigned to modules according to a measure of
likelihood.
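
One simple way to flag nodes that are weakly connected to their assigned module is to measure the fraction of each node's edges that stay inside that module. The sketch below is a hypothetical diagnostic, not the robustness test used in this work; the function name and the toy network are assumptions.

```python
from collections import defaultdict


def within_module_fraction(edges, membership):
    """Fraction of each node's edges whose other end lies in the same module.

    edges: iterable of (node, node) pairs, treated as undirected.
    membership: dict mapping each node to its module label.
    Nodes without edges are omitted from the result.
    """
    inside = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        if u == v:  # ignore self-loops
            continue
        same = membership[u] == membership[v]
        for node in (u, v):
            degree[node] += 1
            inside[node] += same  # True counts as 1, False as 0
    return {node: inside[node] / degree[node] for node in degree}


# Toy network (hypothetical): node "c" has most of its edges leaving module 0.2.
membership = {"a": "0.1", "b": "0.1", "c": "0.2", "d": "0.2"}
edges = [("a", "b"), ("a", "c"), ("c", "d"), ("b", "c")]
fraction = within_module_fraction(edges, membership)
print(fraction)
```

Nodes with a low fraction are natural candidates for membership in more than one module, in line with the limitation discussed above.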
FIG. 3: Map of the density of inter-community links. The darkest spots indicate a strong connection between two regulatory
sub-modules, while the large shadowed boxes enclose the connectivity strengths among the 15 top-level communities. A set of
relevant interactions is highlighted: strong co-regulation patterns appear between the amino acid biosynthesis module and
the lipid metabolism module (I), the aerobic respiration pathway (II), and the drug response system (III). The regulation of
genes involved in resistance to arsenic compounds is behind the co-regulation pattern between the DNA maintenance module
and the aerobic respiration module shown in (IV).



